Community detection in signed networks based on discrete-time model
Chen Jianrui, Zhang Li, Liu Weiwei, Yan Zaizai
College of Science, Inner Mongolia University of Technology, Hohhot 010051, China

 

† Corresponding author. E-mail: jianrui_chen@sina.com

Abstract

Community detection in signed networks has been studied widely in recent years. In this paper, a discrete difference equation is proposed to imitate the consistently changing phases of the nodes. During the interaction, each node will update its phase based on the difference equation. Each node has many different nodes connected with it, and these neighbors have different influences on it. The similarity between two nodes is applied to describe the influences between them. Nodes with high positive similarities will get together and nodes with negative similarities will be far away from each other. Communities are detected ultimately when the phases of the nodes are stable. Experiments on real world and synthetic signed networks show the efficiency of detection performance. Moreover, the presented method gains better detection performance than two existing good algorithms.

1. Introduction

Community detection of complex networks has been a hot topic in the network study.[1] Many social, recommendation, and information systems can be naturally described as networks, where vertices represent entities and links denote relationships or interactions between the vertices.[2] The topological structure of many networks often features an organization of communities in which the nodes tend to divide into clusters, with dense connections within clusters and sparser connections between them.[3] There are many relationships between individuals in the real world, especially in social networks. Commonly, the social relationships between individuals have positive relationships and negative relationships.[4] It will be regarded as a positive link when two individuals have a friendly relationship, however, it will be regarded as a negative link when two individuals have a hostile relationship. Such networks are called signed networks.[5] The relationships between individuals also have different levels, the similarity can be defined by using the essential attributes of nodes: two nodes are considered to be similar if they have many common features.[6]

There are many algorithms proposed for community detections, such as KerNighanLin algorithm,[7] spectral bisection method,[8] Girvan–Newman algorithm,[9,10] etc. An agent-based heuristic algorithm (FEC) has been adopted to mine signed social networks in which both positive within-group relationships and negative between-group relationships are dense.[11] Two overlapping community detection algorithms have been introduced, one is the signed probabilistic mixture (SPM) model for overlapping community detection in signed networks,[12] and the other is the dominant label propagation algorithm (DLPA).[13] Dynamic algorithms or cluster synchronization[14] have been proposed in many papers. A dynamic community detection algorithm based on the phase synchronization of Kuramoto oscillators has been proposed.[15] It has put in negative coupling strength coefficients in order to make the phases of two unconnected oscillators in the network evolve far away, so the network can be divided into several clusters. Besides, several improved Kuramoto models have been introduced in Refs. [16]–[19]. Also, reference [20] studied the oscillatory behaviors of excitable complex networks (ECNs) and found some interesting dynamic behaviors of ECNs in oscillatory probability. The above mentioned work bridges the gap between the structure and the dynamics of networks. What is more, several algorithms based on the statistical measures for characterizing the degree correlation,[21] Potts spin model,[22,23] self-loop rescaling strategy,[24] local modularity,[25] and joint matrix factorization[26] also played an effective role in detecting communities. There are some optimization algorithms applied in community discovery, such as single objective optimization algorithm,[27] multi objective optimization algorithm,[28] etc.

In this paper, the discrete-time model is applied to solve the community detection problem of a signed network. Furthermore, a novel similarity is put forward to imitate the relationship between two nodes. The remainder of the paper is organized as follows. In section 2, the related work is introduced and the model is described. In section 3, the theoretical analysis is given. In section 4, the detailed algorithm is provided. In section 5, experimental results are demonstrated. Section 6 makes a summary of this work.

2. A discrete-time model based on similarity

In this paper, the signed network is considered, where V is the set of nodes, and W is the set of weights (similarity) . There is some literature about community detection of a signed network. An algorithm for a signed social network (DBAS) has been presented which is detailed dynamics-based.[29] The model is

(1)
Here N is the number of nodes in the positive social network, is the phase of the i-th node, is the positive coupling strength, and is the negative coupling strength. If the j-th node is positively connected with the i-th node, , if the j-th node is negatively connected with the i-th node, , otherwise .

In general, all the non-observed links are ranked according to their scores, the links connecting more similar nodes are supposed to be of higher existence likelihoods.[30] In this situation, many local similarity measures in networks are closely related to the concept of community structures.[31] We have found that the similarity between nodes should be considered. Dynamic mechanism with similarity would gain better detection performance than the model without similarity.[32]

In Ref. [33], a novel similarity measurement was defined according to the social balance theory[3436] for the signed network. Due to their different similarities, inner-community positive links were found and detected. A threshold has been given to decide a node's positive neighbors who have high similarity in magnitude with this node. However the algorithm in Ref. [33] has difficulty in detecting the community structure for the following two kinds of networks. (i) The total number of communities is large. (ii) The sizes of different communities are not balanced. To further improve the detection precision, we give the following similarity definition and network model.

In many applications, there are some nodes without direct connection but they still have similarities.[37] To handle this problem, a kind of similarity between two nodes is defined as

(2)
where N k is the number of the paths with k hops between nodes i and j, and m is the largest hops of the path between nodes i and j. In fact, if the i-th node and the j-th node are directly connected. N 1 is the number of common neighbors of nodes i and j, which is equal to the number of one-hop paths. Moreover, in Eq. (2) is the weight of the k-hop path to the similarity, it can be calculated by
where d is the average degree of the network.

In this paper, we consider the influences from both positive neighbors and negative neighbors. The similarity evaluation formula just takes the first two terms to reduce the computational cost, that is to say, only one-hop and two-hop paths have been considered. For the one-hop path, consider nodes i, j, k, here node k is the common neighbor of nodes i and j. If nodes i and j have node k as their common positive neighbor (shown in Fig. 1(a)), the similarity between i and j should be positive. If nodes i and j have node k as their common negative neighbor (shown in Fig. 1(b)), the similarity between i and j should be positive, too. In other words, we are friends if we have a common friend or common enemy. Thus the positive one-hop path is computed as

(3)
Here is the adjacency matrix. if there is a positive link between the i-th node and the j-th node; if there is a negative link between the i-th node and the j-th node; otherwise, if there is no link between the i-th node and the j-th node.

Fig. 1. (color online) Connections between one-hop path nodes. See text for the detailed description of panels (a)–(d).

However, if the k-th node is a positive neighbor of the i-th node and a negative neighbor of the j-th node (shown in Fig. 1(c)), there will be a negative influence between nodes i and j. If the k-th node is a negative neighbor of the i-th node and a positive neighbor of the j-th node (shown in Fig. 1(d)), there will also be a negative influence between nodes i and j. That is, the enemy of my friend is my enemy and the friend of my enemy is my enemy. Similarly, the negative one-hop path is computed as

(4)
Without loss of generality, for the two-hop path, consider nodes i, j, k, and h. There are eight kinds of situations, as shown in Fig. 2, and the different links denote different situations. Similar to the one-hop path, the positive and the negative two-hop paths can be defined as
(5)
(6)
Above all, the similarity between nodes i and j is defined as follows:
(7)
Furthermore by normalization, the similarity can be redefined as follows:
(8)
After calculating the similarity of nodes, we find that the probability density has a normal distribution, and it has the characteristic that most similarities are close to the mean. Here the synthetic signed network with 500 nodes is taken as an example, as shown in Fig. 3. To better grasp the different similarity distributions in different networks, upper and lower quantile α will be adopted as a threshold. The community structure of the signed networks has the following characteristics: there are more positive links within the communities and more negative links between communities. But the community detection of the signed networks often shows the phenomenon that there are more positive links between communities and more negative links within the communities. For this reason, the sets of positive and negative neighbors are determined by the upper and lower quantile α. The definition of α is
(9)
According to Eq. (9), the set of positive neighbors for node i is expressed as
The set of negative neighbors for node i is expressed as
In this paper, we further improve the formulation from Ref. [19] as follows:
(10)
Here and are predefined parameters, and . The initial phase is randomly and uniformly selected and distributed in . The parameter α is in the range .

Fig. 2. (color online) Connections between two-hop path nodes.
Fig. 3. (color online) Probability density estimate of similarity matrix .
3. Theoretical analysis

To simplify network model Eq. (10), the following transformation is adopted. Define

All positive values higher than in S are kept in and the other elements in are all zeros. In the same way, all negative values lower than 0 in S are kept in and the other elements in are all zeros. is a diagonal matrix and the diagonal element is defined as
is a diagonal matrix and the diagonal element is defined as
Furthermore, denote .

The network model is uniformly stable if the following matrix is negative semi-definite:

(11)
provided with suitable and .

The identical transformation of Eq. (10) can be deduced as follows:

In this paper, and the initial values are selected randomly from the interval , we can obtain . It can be observed that , where
The above formula can be further expressed as
Obviously,
Then
Denote . The matrix form can be written as
Define the Lyapunov function
then
Here, E is an N × N identity matrix. If and are selected to satisfy that the following matrix
is negative semi-definite. According to the Lyapunov stability theory,[38] the network model is uniformly stable.

The above theoretical analyses show that our proposed network model would be uniformly stable. With time evolving, the community structure would be formed according to the stable results.

Parameter analysis In fact, it can be found that the stable condition in Theorem 1 can be held with suitable and . We give further demonstration.

Denote The diagonal elements of are

If suitable and are provided, each diagonal element in would satisfy the following condition:
The diagonal elements in and are equal to 0 due to the diagonal elements in the similarity matrix being equal to 0. In this case, For arbitrarily , can satisfy the condition of strictly diagonally dominant matrix
It can be found that is invertible. Besides, the linear equation merely has zero solution. This means (m is any given negative number) is not the eigenvalue of , all eigenvalues of are not positive numbers. Consequently, is a negative definite matrix.

The stable condition can be satisfied as is ranged in and is ranged in for different signed networks. However, in Ref. [19], for different scales of signed networks, the values of and are adjusted in a very wide range.

4. The algorithm

The discrete-time based model for community detection in signed network (denoted as DTCDS) is further described in detail.

For a signed network, the connections should be positive and dense inside communities, and the connections should be negative and sparse between communities.[33] During the iteration of Eq. (10), the nodes with high positive similarities will evolve together, while the nodes with negative similarities will evolve far away. The algorithm flow of DTCDS is given as follows.

Input the adjacency matrix. According to Ref. [33], it has positive links inside communities and negative links between communities. The adjacency matrix of a synthetic network is input and Figure 4(a) gives the network structure

Calculate the similarity matrix. The similarity matrix is calculated by Eq. (8)

The values only keep two decimal places after the point for simplicity.

Generate the initial phases of the nodes. The initial phases ( ) are generated randomly in the range . Give three values for the parameters , , and α. The initial phases of the 10 nodes are expressed by a vector

The values only keep two decimal places after the point for simplicity.

Update the phases of the nodes. Update the phases of the nodes by Eq. (10), the terminal condition is

where in this paper.

Detect the community structure. Obtain the distribution curve of phases based on network model Eq. (10), as shown in Fig. 4(b). To better detect the communities, the stable phases are multiplied by 100. If the two nodes i and j satisfy the condition (a relative small positive number), the nodes are regarded as in the same community. It can be found that 10 nodes are divided into three communities, and they are , , and , respectively. This means that the community structure can be detected successfully according to the different phases based on our proposed method.

Fig. 4. (color online) Synthetic network. (a) The community structure detected by DTCDS. Triangle nodes are in the first community, square nodes are in the second community, and circle nodes are in the third community. (b) Phases of the nodes evolving with time.
5. Simulations

In this section, the proposed algorithm DTCDS is applied to both real world signed networks and synthetic signed networks. All the algorithms are coded in Matlab 2013a, the experiments have been executed on Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80 GHz, with 64 GB memory. The operating system is Windows Server 2008 R2 Enterprise.

5.1. Results on real world signed networks

The following real world signed networks are used to test the algorithm DTCDS.

U.S. supreme court (USC) network The USC network is about 9 justices’ voting behavior on the U.S. supreme court for the 2006–2007 term.[39] The justices are Stevens (1), Ginsburg (2), Souter (3), Breyer (4), Kennedy (5), Alito (6), Roberts (7), Scalia (8), and Thomas (9). Figure 5(b) gives the community detection results based on algorithm DTCDS.

Fig. 5. (color online) USC network. (a) The community structure detected by DTCDS. Square nodes are in the first community, and circle nodes are in the second community. (b) Phases of the nodes evolving with time.

In Ref. [39], it was shown that the USC network was partitioned into three communities or two communities. The topology structure is shown in Fig. 5(a). The difficulty to detect the community comes from Justice Kennedy. In fact, he has positive links with all the justices of the conservative wing (Stevens, Ginsburg, Souter, Breyer), and has three positive links and one negative link with the liberal wing (Alito, Roberts, Scalia, Thomas). Therefore, there would be two kinds of results. Based on DTCDS, the USC network has two communities: , and , as shown in Fig. 5(b). The line of node 5 is found a little far from the communities, sometimes it can be considered as a single community when there is no demand for the number of nodes in a community. The parameters are , , and .

Gahuku–Gama subtribes (GGS) network The GGS network came from a research about the cultures of New Guinea Highland, which is the political alliances and oppositions among 16 Gahuku–Gama subtribes.[40] Figure 6(a) shows that the political relationship among the subtribes can be positive or negative. Node 5 has two positive links with C 2, and one positive link with C 3. Node 13 has three positive links with C 2 and one positive link with C 3. It increases the difficulty to separate the communities C 2 and C 3. Through the DTCDS algorithm, 16 nodes are divided into three communities, and they are , , and . Comparing with Ref. [40], we can detect the communities successfully, which is shown in Fig. 6(b). The parameters are , , and .

Fig. 6. (color online) GGS network. (a) The community structure detected by DTCDS. Square nodes are in the first community, triangle nodes are in the second community, and circle nodes are in the third community. (b) Phases of the nodes evolving with time.
5.2. Results on synthetic signed networks

In this part, two kinds of synthetic networks are applied to test the efficiency of the DTCDS algorithm.

Small-benchmark synthetic network Two synthetic signed networks are widely used for tests,[29,32,33] which came from Ref. [11]. Figure 7 shows the topology structure of these two networks. They consist of 28 nodes, and have three communities. From Fig. 7(a), it can be found that the network has three communities. Figure 7(b) has 7 more negative links compared to Fig. 7(a), and it is unbalanced. The three communities in Fig. 7(a) are , , and . Based on the DTCDS algorithm, three communities are detected in the two networks, as shown in Fig. 8. The parameters are , , and .

Fig. 7. (color online) The community structure detected by DTCDS. Square nodes are in the first community, circle nodes are in the second community, and triangle nodes are in the third community. (a) Synthetic network 1. (b) Synthetic network 2.
Fig. 8. (color online) Phases of the nodes evolving with time based on DTCDS. C 1 is the community with square nodes, C 2 is the community with circle nodes, and C 3 is the community with triangle nodes. (a) Synthetic network 1. (b) Synthetic network 2.

Large-scale synthetic network The method to generate synthetic signed networks is introduced by Refs. [4, 33], and [41]. The synthetic signed networks are denoted as , where N is the number of nodes, g is the number of communities, p is the ratio of the number of inter-community positive links to that of the intra-community positive links, and q is the probability of randomly flipping the signs of the existing links. Three different size synthetic networks are generated as follows:

The results of the community detection for the signed networks (Fig. 9(a)), (Fig. 9(b)), and (Fig. 9(c)) are obtained. It can be found that has ten communities, also has ten communities, and has fifteen communities. The parameters are , , and .

Fig. 9. (color online) Phases of the nodes evolving with time based on DTCDS: (a) , (b) , (c) .

In this paper, the normalized mutual information (NMI) is adopted to evaluate the efficiency of the proposed algorithm when the ground truth of the network is known.[42] Two existing algorithms based on the dynamic mechanism are applied to compare with the proposed algorithm. One is the dynamic evolutionary clustering algorithm for signed networks (DEC),[33] and the other is the dynamic-based algorithm for signed social networks (DBAS).[29] They are algorithms recently proposed for signed networks and gain good performances. The two algorithms are efficient and can run on large-scale signed networks. For the above three synthetic signed networks, different values of p and q are given as and , respectively. Figure 10 shows the NMI corresponding to different p and q. The vertical axis NMI is the average of 50 runs.

Fig. 10. (color online) Comparing between DTCDS and existing algorithms: (a) , (b) , (c) , (d) , (e) , (f) , (g) , (h) , (i) .

From Fig. 10, it can be observed that the DTCDS algorithm outperforms DBAS and DEC when . DTCDS is relatively in balance with different q except for . The main reason is that the negative links would cause little effect on the detection performance with the same p, so parameter should be adjusted to advance the value of NMI. The detection results of DBAS and DEC are always lower than DTCDS, and they only obtain higher detection accuracy than DTCDS at , as shown in Fig. 10(i).

In addition, we compare the running time of these algorithms, as shown in Table 1. The running time is almost equal when the number of nodes is the same, so we only give three signed networks results, i.e., , , . It can be observed that the running time of our DTCDS algorithm is shorter than the DBAS algorithms. As the number of nodes increases, the running time is growing slowly, for instance, 1.3419 s for 500 nodes and 10.5984 s for 1000 nodes. But for the DBAS algorithm, the running time is growing quickly. DEC runs the fastest among the three algorithms, but it obtains the lowest NMI.

Table 1.

Comparing running time (in units of s) among DTCDS, DEC, and DBAS for .

.

In this section, the simulations are performed for the synthetic signed networks with a large number of nodes and abundant communities, which increase the difficulty on community detection, even though the DTCDS algorithm can obtain a higher NMI. For instance, as shown in Fig. 10, the NMI can reach 0.9834 when the number of nodes is 2000 and the number of communities is 15.

6. Conclusion

In this paper, a discrete-time-model-based method is designed for signed networks community detection. The similarity between nodes is used to make aggregation speed and separating rate quicker. The positive similarity makes two nodes evolve together easier, and the negative similarity makes two nodes evolve far away. Community structure can be discovered as the phases are stable based on the dynamic mechanism of the proposed network model. This algorithm is appropriate not only for the small-scale signed networks but also for large-scale signed networks. Moreover, this algorithm can also be applied to the networks with more communities, and it has better accuracy and runs faster than the compared algorithms. The accuracy of this algorithm is closely related to the similarity of the nodes, new similarity can be found to improve the accuracy, which needs further research.

Reference
1 Chen J Jiao L C Wu J S Wang X D 2010 Nonlinear Analysis: Real World Applications 4 3045
2 Jiang M S Chen Y X Chen L 2015 arXiv: 1502. 04380
3 Newman M J 2004 Phys. Rev. E 69 066133
4 Wu L T Ying X W Wu X T Lu A D Zhou Z H 2012 International Journal of Social Network Mining 1 91
5 Doreian P Mrvar A 1996 Social Networks 2 149
6 Lin D K 1998 ICML 98 296
7 Kernighan B W Lin S 1970 Bell System Technical Journal 2 291
8 Pothen A Simon H D Liou K P 1990 SIAM Journal on Matrix Analysis and Applications 3 430
9 Newman M E J 2004 Proc. Natl. Acad. Sci. USA 1 5200
10 Newman M E J 2006 Proc. Natl. Acad. Sci. USA 23 8577
11 Yang B Cheung W K Liu J 2007 Knowledge and Data Engineering IEEE Transactions on 10 1333
12 Chen Y Wang X L Yuan B Tang B Z 2014 Journal of Statistical Mechanics: Theory and Experiment 2014 03021
13 Sun L H Huang J B Tian Y Q Song Q B Liu H L 2015 Chin. Phys. B 24 551
14 Wang Y Cao J 2013 Nonlinear Analysis: Real World Applications 1 842
15 Wu J S Jiao L C Jin C 2012 Phys. Rev. E 1 016115
16 Wu J C Wang X H Jiao L C 2012 Physica A 3 508
17 Almendral J A Leyva I Li D Sendiña-Nadal Havlin S Boccaletti S 2010 Phys. Rev. E 1 016115
18 Li D Leyva I Almendral J A 2008 Phys. Rev. Lett. 16 168701
19 Wu J S Jiao Y 2014 Chaos 3 033104
20 Zhang L S Gu W F Hu G Mi Y Y 2014 Chin. Phys. B 10 626
21 Xiang J Hu K Zhang Y Hu T Li J M 2015 Europhys. Lett. 111 48003
22 Xiang J Hu T Hu K Tang Y N Gao Y Y Chai C H Liu X J 2015 Canadian Journal of Physics 93 418
23 Esmailian P Jalili M 2015 Scientific Reports 5 14339
24 Xiang J Tang Y N Gao Y Y Zhang Y Deng K Xu X K Hu K 2015 Physica A 432 127
25 Xiang J Hu T Zhang Y Hu K Li J M Xu X K Liu C C Chen S 2016 Physica A 443 451
26 Chang Z C Liu Y Yu H T Huang R Y 2015 Acta Phys. Sin. 21 218901 in Chinese
27 Pizzuti C 2008 Parallel Problem Solving from Nature - PPSN X 10th International Conference September 13–17. 2008 Dortmund Germany p. 1081
28 Liu C L Liu J Jiang Z Z 2014 IEEE Transactions on Cybernetics 12 2274
29 Wu J S Zhang L Li Y Jiao Y 2016 Physica A 443 568
30 LüL Y Zhou T 2011 Physica A 6 1150
31 Xiang J Hu K Zhang Y Bao M H Tang L Tang Y N Gao Y Y Li J M Chen B Y Hu J B 2016 Journal of Statistical Mechanics: Theory and Experiment 2016 033405
32 Chen J R Hong Z M Wang L N Wu L 2014 Chin. Phys. B 11 118903
33 Chen J R Wang H Wang L N Liu W W 2015 Physica A 447 482
34 Tang J Lou T C Kleinberg J 2012 WSDM 743
35 Easley D Kleinberg J 2010 Networks, Crowds, and Markets: Reasoning about a Highly Connected World 26 Cambridge Cambridge University Press p. 10
36 Khadivi A Rad A A Hasler M 2010 Circuits and Systems (ISCAS) Proceedings of 2010 IEEE International Symposium on IEEE 3777
37 Wu J S Hou Y T Jiao Y Li Y Li X X Jiao L C 2015 Physica A 433 218
38 Sideris T C 2013 Ordinary Differential Equations and Dynamical Systems New York Atlantis Press
39 Doreian P Mrvar A 2009 Social Networks 31 1
40 Read K E 1954 Southwestern Jouranal of Anthropopgy 10 1
41 Zeng Y Liu J 2015 Proceedings of the 18th Asia Pacific Symposium on Intelligent and Evolutionary Systems Heidelberg Springer p. 259
42 Wu F Huberman B A 2004 Eur. Phys. J. B 2 331